Semi-Markov Adaptive Critic Heuristics with Application to Airline Revenue Management
Abstract
The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike, and it is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs), in which the time spent in each transition of the underlying Markov chain is itself a random variable. Unfortunately, for the average reward case, unlike the discounted reward case, the MDP does not have an easy extension to the SMDP. Examples of SMDPs can be found in the areas of supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average reward criterion. We present a convergence analysis of the algorithm, which shows that under certain mild conditions, which can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem is large in scale and suffers from the curse of dimensionality, and hence it is difficult to solve via classical methods of dynamic programming. Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic used widely in the airline industry.
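As a rough illustration of why the average reward criterion changes when transition times are random, the sketch below computes the long-run reward per unit time of a toy two-state SMDP under a fixed policy, using the standard renewal-reward ratio (expected reward per transition divided by expected time per transition). It is not the paper's adaptive critic algorithm; the transition probabilities, rewards, and mean sojourn times are made-up values, and the function names are hypothetical.

```python
from fractions import Fraction

def stationary_two_state(p01, p10):
    """Stationary distribution of a two-state embedded Markov chain.

    p01 is the probability of moving from state 0 to state 1,
    p10 the probability of moving from state 1 to state 0.
    """
    total = p01 + p10
    return (p10 / total, p01 / total)

def smdp_average_reward(pi, rewards, times):
    """Long-run reward per unit time: expected reward per transition
    divided by expected time per transition (renewal-reward ratio)."""
    reward_per_transition = sum(p * r for p, r in zip(pi, rewards))
    time_per_transition = sum(p * t for p, t in zip(pi, times))
    return reward_per_transition / time_per_transition

# Toy numbers, assumed purely for illustration.
p01, p10 = Fraction(1, 2), Fraction(1, 5)
rewards = (Fraction(10), Fraction(4))  # expected reward earned in each state
times = (Fraction(2), Fraction(1))     # expected sojourn time in each state

pi = stationary_two_state(p01, p10)
rho_smdp = smdp_average_reward(pi, rewards, times)

# MDP-style average: every transition is treated as taking one time unit.
rho_per_transition = sum(p * r for p, r in zip(pi, rewards))

print(rho_smdp, rho_per_transition)
```

With these numbers the reward per unit time (40/9) differs from the reward per transition (40/7), because state 0 pays more but also holds the process longer. An average reward method for SMDPs has to account for this time weighting, which the discounted case handles more directly.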
Similar resources
Adaptive Critics for Airline Revenue Management
number: 007-0058. Adaptive Critics for Airline Revenue Management. Abhijit Gosavi, Department of Industrial and Systems Engineering, University at Buffalo, SUNY, 317 Bell Hall, Buffalo, NY 14260. POMS 18th Annual Conference, Dallas, Texas, U.S.A., May 4 to May 7, 2007. Abstract: We present an approximate dynamic programming (DP) technique, called adaptive critic, for solving an airli...
Revenue Management Without Forecasting or Optimization: An Adaptive Algorithm for Determining Airline Seat Protection Levels
We investigate a simple adaptive approach to optimizing seat protection levels in airline revenue management systems. The approach uses only historical observations of the relative frequencies of certain seat-filling events to guide direct adjustments of the seat protection levels in accordance with the optimality conditions of Brumelle and McGill (1993). Stochastic approximation theory is used...
Model-Building Adaptive Critics for Semi-Markov Control
Adaptive (or actor) critics are a class of reinforcement learning algorithms. Generally, in adaptive critics, one starts with randomized policies and gradually updates the probability of selecting actions until a deterministic policy is obtained. Classically, these algorithms have been studied for Markov decision processes under model-free updates. Algorithms that build the model are often more...
A Genetic Algorithm for Choice-Based Network Revenue Management
In recent years, enriching traditional revenue management models by considering the customer choice behavior has been a main challenge for researchers. The terminology for the airline application is used as representative of the problem. A popular and an efficient model considering these behaviors is choice-based deterministic linear programming (CDLP). This model assumes that each customer bel...
AIRLINE STOCHASTIC CAPACITY ALLOCATION BY APPLYING REVENUE MANAGEMENT
To formulate a single-leg seat inventory control problem in an airline ticket sales system, the concept and techniques of revenue management are applied in this research. In this model, it is assumed the cabin capacity is stochastic and hence its exact size cannot be forecasted in advance, at the time of planning. There are two groups of early-reserving and late-purchasing customers demanding t...